Efficient processing of similarity search under time warping in sequence databases: an index-based approach
Abstract
This paper discusses effective processing of similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for similarity search under time warping cannot employ multi-dimensional indexes without false dismissal, since the time warping distance does not satisfy the triangle inequality. They must scan the entire database and therefore suffer from serious performance degradation in large databases. Another method, which uses a suffix tree and does not assume any particular distance function, also performs poorly because of the large tree size. In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to enhance search performance in large databases without permitting any false dismissal. To attain this goal, we devise a new distance function, Dtw-lb, that consistently underestimates the time warping distance and also satisfies the triangle inequality. Dtw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and Dtw-lb as the distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves a speedup of up to 43 times on a data set of real-world S&P 500 stock data sequences and up to 720 times on data sets containing a very large volume of synthetic data sequences. The performance gain becomes larger (1) as the number of data sequences gets larger, (2) as the average length of data sequences gets longer, and (3) as the tolerance in a query gets smaller.
Considering the characteristics of real databases, these tendencies imply that our approach is suitable for practical applications.
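The filter-and-refine idea in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: the four features are taken to be the first, last, greatest, and smallest elements of a sequence, the time warping distance is taken to use the maximum absolute element difference along the warping path, and the lower bound is the largest absolute difference between corresponding features. All function names are hypothetical, and a plain linear scan over feature vectors stands in for the multi-dimensional index.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, float, float, float]


def features(seq: Sequence[float]) -> Feature:
    # Assumed 4-tuple: (first, last, greatest, smallest).
    # Repeating elements (time warping) leaves all four values unchanged.
    return (seq[0], seq[-1], max(seq), min(seq))


def d_tw_lb(f: Feature, g: Feature) -> float:
    # Assumed lower bound: L-infinity distance between feature vectors.
    return max(abs(a - b) for a, b in zip(f, g))


def d_tw(s: Sequence[float], q: Sequence[float]) -> float:
    # Time warping distance by dynamic programming, assuming the cost of a
    # warping path is the largest element difference on that path.
    n, m = len(s), len(q)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = max(abs(s[i - 1] - q[j - 1]), best_prev)
    return d[n][m]


def range_search(db: List[List[float]], query: Sequence[float],
                 eps: float) -> List[List[float]]:
    # Filter step: discard sequences whose lower bound already exceeds eps.
    # Because the bound never overestimates d_tw, no true answer is dropped.
    fq = features(query)
    candidates = [s for s in db if d_tw_lb(features(s), fq) <= eps]
    # Refine step: verify the survivors with the exact distance.
    return [s for s in candidates if d_tw(s, query) <= eps]


if __name__ == "__main__":
    db = [[1, 2, 2, 3], [5, 6, 7], [1, 2, 4]]
    print(range_search(db, [1, 2, 3], eps=0.5))
```

In this sketch the filter step touches only four numbers per sequence, so the quadratic-time `d_tw` computation runs only on the candidates that survive. In the method described above, the feature vectors would instead be stored in a multi-dimensional index so that the filter step avoids scanning the whole database.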
Similar resources
An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence
This paper discusses an effective processing of similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for processing similarity search that supports time warping fail to employ multi-dimensional indexes without false dismissal since the time warping distance doe...
SBASS: Segment based approach for subsequence searches in sequence databases
The sequence database is a set of data sequences, each of which is an ordered list of elements [1]. Sequences of stock prices, money exchange rates, temperature data, product sales data, and company growth rates are the typical examples of sequence databases [2, 8]. Similarity search is an operation that finds sequences or subsequences whose changing patterns are similar to that of a given quer...
Similarity search of time-warped subsequences via a suffix tree
This paper proposes an indexing technique for fast retrieval of similar subsequences using the time warping distance. The time warping distance is a more suitable similarity measure than the Euclidean distance in many applications where sequences may be of different lengths and/or different sampling rates. The proposed indexing technique employs a disk-based suffix tree as an index structure an...
Segment-Based Approach for Subsequence Searches in Sequence Databases
This paper deals with the subsequence searching problem under time-warping in sequence databases. Our work is motivated by the observation that subsequence searches slow down quadratically as the average length of data sequences increases. To resolve this problem, the Segment-Based Approach for Subsequence Searches (SBASS) is proposed. The SBASS divides data and query sequences into a series of...
Similarity-Based Subsequence Search in Image Sequence Databases
This paper proposes an indexing technique for fast retrieval of similar image subsequences using the multi-dimensional time warping distance. The time warping distance is a more suitable similarity measure as compared to the Lp distance in many applications where sequences may be of different lengths and/or different sampling rates. Our indexing scheme employs a disk-based suffix tree as an ind...
Journal title:
- Inf. Syst.
Volume 29, Issue
Pages -
Publication date 2004